Remaining case where reltuples can become distorted across multiple VACUUM operations

  • pg@bowt.ie - 2022-08-06T00:39:56+00:00
    My bugfix commit 74388a1a (which was pushed back in February) added
    heuristics to VACUUM's reltuples calculation/estimate. This prevented
    VACUUM from distorting our estimate of reltuples over time, across
    successive VACUUM operations run against the same table. The problem was
    that VACUUM could scan the same single heap page again and again, while
    believing it saw a random sample each time. This eventually leads to a
    pg_class.reltuples value that is based on the assumption that every single
    heap page in the table is just like the heap page that gets "sampled"
    again and again. This was always the last heap page (due to implementation
    details related to the work done by commit e8429082), which in practice
    tends to be a particularly poor representation of the table's overall
    tuple density.

    I have discovered a gap in these heuristics: there are remaining cases
    where its percentage threshold doesn't prevent reltuples distortion as
    intended. It can still happen with tables that are small enough that a
    cutoff of 2% of rel_pages is less than a single page, yet still large
    enough that vacuumlazy.c will consider it worth its while to skip some
    pages using the visibility map. It will typically skip all but the final
    heap page of the relation (same as the first time around).

    Here is a test case that shows how this can still happen on HEAD (and in
    Postgres 15):

    regression=# create table foo(bar int);
    CREATE TABLE
    regression=# insert into foo select i from generate_series(1, 10000) i;
    INSERT 0 10000

    Now run vacuum verbose against the table several times:

    regression=# vacuum verbose foo;
    *** SNIP ***
    regression=# vacuum verbose foo;

    The first vacuum shows "tuples: 0 removed, 10000 remain...", which is
    correct. However, each subsequent vacuum verbose revises the estimate
    downwards, eventually making pg_class.reltuples significantly
    underestimate tuple density.

    Attached patch closes the remaining gap. With the patch applied, the
    second and subsequent vacuum verbose operations from the test case will
    show that reltuples is still 10000 (it won't ever change). The patch just
    extends an old behavior that was applied when scanned_pages == 0 to cases
    where scanned_pages <= 1 (unless we happened to scan all of the relation's
    pages, of course). It doesn't remove the original test from commit
    74388a1a, which still seems like a good idea to me.

    --
    Peter Geoghegan
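
    For readers who haven't looked at vacuum.c, here is a heavily simplified
    sketch of the estimate being discussed, with the proposed "single scanned
    page" guard added. It illustrates the logic described in this thread, not
    the actual vac_estimate_reltuples() code; the function name, the exact
    ordering of the checks, and the surrounding details are approximations.

        #include <math.h>
        #include <stdint.h>

        typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

        static double
        reltuples_estimate_sketch(BlockNumber old_rel_pages, double old_rel_tuples,
                                  BlockNumber total_pages, BlockNumber scanned_pages,
                                  double scanned_tuples)
        {
            double      old_density;
            double      unscanned_pages;

            /* Scanned every page: the count really is authoritative */
            if (scanned_pages >= total_pages)
                return scanned_tuples;

            /*
             * Proposed fix: a single scanned page is no better a "sample" than
             * zero scanned pages, so keep the old estimate rather than
             * extrapolating from it.  (Previously only scanned_pages == 0 was
             * treated this way.)
             */
            if (scanned_pages <= 1)
                return old_rel_tuples;

            /*
             * Heuristic from commit 74388a1a: if rel_pages hasn't changed at
             * all and we scanned less than 2% of it, don't trust the "sample".
             * For a ~45 page table that threshold is below one page, so it
             * never fires -- the gap described above.
             */
            if (old_rel_pages == total_pages &&
                scanned_pages < (double) total_pages * 0.02)
                return old_rel_tuples;

            /* Old density unknown (reltuples is -1): just scale up what we saw */
            if (old_rel_tuples < 0 || old_rel_pages == 0)
                return floor((scanned_tuples / scanned_pages) * total_pages + 0.5);

            /* Default: assume unscanned pages kept their previous tuple density */
            old_density = old_rel_tuples / old_rel_pages;
            unscanned_pages = (double) total_pages - (double) scanned_pages;
            return floor(old_density * unscanned_pages + scanned_tuples + 0.5);
        }

    With this shape, the foo table from the test case keeps reltuples at 10000
    once vacuumlazy.c is scanning only the final heap page.
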
    • pg@bowt.ie - 2022-08-08T14:51:51+00:00
      On Fri, Aug 5, 2022 at 5:39 PM Peter Geoghegan <pg@bowt.ie> wrote:
      > Attached patch closes the remaining gap. With the patch applied,
      > the second and subsequent vacuum verbose operations from the test case
      > will show that reltuples is still 10000 (it won't ever change). The
      > patch just extends an old behavior that was applied when scanned_pages
      > == 0 to cases where scanned_pages <= 1 (unless we happened to scan all
      > of the relation's pages, of course).

      My plan is to commit this later in the week, barring objections. Maybe
      on Thursday.

      --
      Peter Geoghegan
      • boekewurm+postgres@gmail.com - 2022-08-08T15:14:08+00:00
        On Mon, 8 Aug 2022 at 16:52, Peter Geoghegan <pg@bowt.ie> wrote:
        > On Fri, Aug 5, 2022 at 5:39 PM Peter Geoghegan <pg@bowt.ie> wrote:
        > > Attached patch closes the remaining gap. With the patch applied,
        > > the second and subsequent vacuum verbose operations from the test
        > > case will show that reltuples is still 10000 (it won't ever
        > > change). The patch just extends an old behavior that was applied
        > > when scanned_pages == 0 to cases where scanned_pages <= 1 (unless
        > > we happened to scan all of the relation's pages, of course).
        >
        > My plan is to commit this later in the week, barring objections.
        > Maybe on Thursday.

        I do not have intimate knowledge of this code, but shouldn't we also
        add some safety guarantees like the following in these blocks? Right
        now, we'll keep underestimating the table size even when we know that
        the count is incorrect.

        if (scanned_tuples > old_rel_tuples)
            return some_weighted_scanned_tuples;

        Kind regards,

        Matthias van de Meent
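
        Here is one possible concrete reading of that guard, as it might slot
        into the early-return branches sketched upthread. This is hypothetical
        and not from the posted patch; it only encodes the observation that
        whenever the scanned pages alone hold more tuples than the whole
        stored estimate, the stored estimate is certainly too low.

            /*
             * Hypothetical guard (not from the patch): the table contains at
             * least the tuples we just counted on the scanned pages, so never
             * keep a smaller stored estimate than that.
             */
            if (scanned_tuples > old_rel_tuples)
                return scanned_tuples;
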
        • pg@bowt.ie - 2022-08-08T15:25:54+00:00
          On Mon, Aug 8, 2022 at 8:14 AM Matthias van de Meent
          <boekewurm+postgres@gmail.com> wrote:
          > I do not have intimate knowledge of this code, but shouldn't we
          > also add some safety guarantees like the following in these
          > blocks? Right now, we'll keep underestimating the table size even
          > when we know that the count is incorrect.
          >
          > if (scanned_tuples > old_rel_tuples)
          >     return some_weighted_scanned_tuples;

          Not sure what you mean -- we do something very much like that
          already.

          We take the existing tuple density, and assume that it hasn't
          changed for any unscanned pages -- that is used to build a total
          number of tuples for the unscanned pages. Then we add the number of
          live tuples/scanned_tuples that the vacuumlazy.c caller just
          encountered on scanned_pages. That's often where the final reltuples
          value comes from.

          --
          Peter Geoghegan
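
          As a rough illustration of that default path, here are approximate
          numbers for the foo table from the test case upthread. The page
          layout (44 pages of 226 single-int tuples plus a 56-tuple final
          page, so rel_pages = 45) is an assumption about a stock 8kB-page
          build, not something reported in the thread.

              #include <stdio.h>

              int
              main(void)
              {
                  double old_rel_tuples = 10000.0;  /* reltuples after the first VACUUM */
                  double old_rel_pages = 45.0;      /* assumed rel_pages for 10000 int rows */
                  double scanned_pages = 1.0;       /* only the final page is not all-visible */
                  double scanned_tuples = 56.0;     /* assumed tuple count on that final page */

                  double old_density = old_rel_tuples / old_rel_pages;      /* ~222 */
                  double estimate = old_density * (old_rel_pages - scanned_pages)
                                    + scanned_tuples;

                  /* Prints ~9834; the next VACUUM starts from this lower density,
                   * which is how the estimate keeps drifting downwards. */
                  printf("new reltuples estimate: %.0f\n", estimate);
                  return 0;
              }
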
          • boekewurm+postgres@gmail.com - 2022-08-08T15:33:44+00:00
            On Mon, 8 Aug 2022 at 17:26, Peter Geoghegan <pg@bowt.ie> wrote:
            > On Mon, Aug 8, 2022 at 8:14 AM Matthias van de Meent
            > <boekewurm+postgres@gmail.com> wrote:
            > > I do not have intimate knowledge of this code, but shouldn't
            > > we also add some safety guarantees like the following in these
            > > blocks? Right now, we'll keep underestimating the table size
            > > even when we know that the count is incorrect.
            > >
            > > if (scanned_tuples > old_rel_tuples)
            > >     return some_weighted_scanned_tuples;
            >
            > Not sure what you mean -- we do something very much like that
            > already.
            >
            > We take the existing tuple density, and assume that it hasn't
            > changed for any unscanned pages -- that is used to build a total
            > number of tuples for the unscanned pages. Then we add the number
            > of live tuples/scanned_tuples that the vacuumlazy.c caller just
            > encountered on scanned_pages. That's often where the final
            > reltuples value comes from.

            Indeed we often apply this, but not always. It is the default
            case, but it is never applied in the special cases. For example,
            if the measured 2% of the pages currently contains more than 100%
            of the previous count of tuples, or with your patch the last page
            contains more than 100% of the previous count of tuples, that new
            count is ignored, which seems silly considering that the vacuum
            count is supposed to be authoritative.

            Kind regards,

            Matthias van de Meent
            • pg@bowt.ie - 2022-08-08T15:49:16+00:00
              On Mon, Aug 8, 2022 at 8:33 AM Matthias van de Meent
              <boekewurm+postgres@gmail.com> wrote:
              > For example, if the measured 2% of the pages currently
              > contains more than 100% of the previous count of tuples, or
              > with your patch the last page contains more than 100% of the
              > previous count of tuples, that new count is ignored, which
              > seems silly considering that the vacuum count is supposed to
              > be authoritative.

              The 2% thing is conditioned on the new relpages value precisely
              matching the existing relpages from pg_class -- which makes it
              very targeted. I don't see why scanned_tuples greatly exceeding
              the existing reltuples from pg_class is interesting (any more
              interesting than the other way around).

              We'll always accept scanned_tuples as authoritative when VACUUM
              actually scans all pages, no matter what. Currently it isn't
              possible for VACUUM to skip pages in a table that is 32 pages or
              less in size. So even the new "single page" thing from the patch
              cannot matter there.

              --
              Peter Geoghegan
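
              For anyone wondering where the 32-page figure comes from:
              vacuumlazy.c only skips ahead over all-visible pages when it
              finds a sufficiently long run of them, governed by this constant
              (shown for reference; the value is as of PostgreSQL 15):

                  /* vacuumlazy.c: minimum run of all-visible pages before
                   * VACUUM will use the visibility map to skip any of them */
                  #define SKIP_PAGES_THRESHOLD    ((BlockNumber) 32)
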
              • boekewurm+postgres@gmail.com - 2022-08-08T16:17:37+00:00
                On Mon, 8 Aug 2022 at 17:49, Peter Geoghegan <pg@bowt.ie> wrote:
                > On Mon, Aug 8, 2022 at 8:33 AM Matthias van de Meent
                > <boekewurm+postgres@gmail.com> wrote:
                > > For example, if the measured 2% of the pages currently
                > > contains more than 100% of the previous count of tuples,
                > > or with your patch the last page contains more than 100%
                > > of the previous count of tuples, that new count is
                > > ignored, which seems silly considering that the vacuum
                > > count is supposed to be authoritative.
                >
                > The 2% thing is conditioned on the new relpages value
                > precisely matching the existing relpages from pg_class --
                > which makes it very targeted. I don't see why scanned_tuples
                > greatly exceeding the existing reltuples from pg_class is
                > interesting (any more interesting than the other way
                > around).

                Because if a subset of the pages of a relation contains more
                tuples than your current total expected tuples in the table,
                you should update your expectations regardless of which blocks
                or which number of blocks you've scanned - the previous stored
                value is a strictly worse estimate than your last measurement.

                > We'll always accept scanned_tuples as authoritative when
                > VACUUM actually scans all pages, no matter what. Currently
                > it isn't possible for VACUUM to skip pages in a table that
                > is 32 pages or less in size. So even the new "single page"
                > thing from the patch cannot matter there.

                A 33-block relation whose first 32 pages hold 1 tuple each can
                still have a last page with 250 tuples, which would be ignored
                in that scheme, leaving a total tuple count of 33 or so.

                Sure, this is an artificial sample, but you can construct
                similarly wrong vacuum samples: two classes of tuples with
                distinct update regimes, one with 32B tuples and one with
                MaxFreeSpaceRequestSize-d tuples. When you start running
                VACUUM against these separate classes of updated blocks you'll
                see that the relation's tuple count will also tend towards
                1 * nblocks, due to the disjoint nature of these updates and
                the tendency to ignore all updates to densely packed blocks.
                With the current code, we ignore the high counts of those
                often-updated blocks and expect low density in the relation,
                precisely because we ignore areas that are extremely dense and
                are updated in VACUUM cycles different from those of the
                bloated blocks.

                Kind regards,

                Matthias van de Meent
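
                Worked through the sketch upthread, the hypothetical 33-page
                table looks like this (all figures come from the example in
                this message, not from a real table):

                    #include <stdio.h>

                    int
                    main(void)
                    {
                        double true_tuples = 32 * 1 + 250;  /* 282 tuples really present */
                        double old_rel_tuples = 33;         /* stale estimate from before
                                                             * the last page filled up */
                        double scanned_pages = 1;           /* only the last page scanned */
                        double scanned_tuples = 250;

                        /* Under the scanned_pages <= 1 rule the 250-tuple page is
                         * ignored and the stored estimate keeps the old value. */
                        double kept = (scanned_pages <= 1) ? old_rel_tuples
                                                           : scanned_tuples;

                        printf("actual %.0f vs kept estimate %.0f\n",
                               true_tuples, kept);
                        return 0;
                    }
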
                • pg@bowt.ie - 2022-08-08T16:48:18+00:00
                  On Mon, Aug 8, 2022 at 9:17 AM Matthias van de Meent
                  <boekewurm+postgres@gmail.com> wrote:
                  > Because if a subset of the pages of a relation contains
                  > more tuples than your current total expected tuples in the
                  > table, you should update your expectations regardless of
                  > which blocks or which number of blocks you've scanned -
                  > the previous stored value is a strictly worse estimate
                  > than your last measurement.

                  The previous stored value could be -1, which represents the
                  idea that we don't know the tuple density yet. So it doesn't
                  necessarily follow that the new estimate is strictly better,
                  even in this exact scenario.

                  > A 33-block relation whose first 32 pages hold 1 tuple each
                  > can still have a last page with 250 tuples, which would be
                  > ignored in that scheme, leaving a total tuple count of 33
                  > or so.

                  The simple fact is that there is only so much we can do with
                  the limited information/context that we have. Heuristics are
                  not usually free of all bias. Often the bias is the whole
                  point -- the goal can be to make sure that we have the bias
                  that we know we can live with, and not the opposite bias,
                  which is much worse. Details of which are usually very
                  domain specific.

                  I presented my patch with a very simple test case -- a very
                  clear problem. Can you do the same for this scenario?

                  I accept that it is possible that we'll keep an old
                  reltuples which is provably less accurate than doing
                  something with the latest information from vacuumlazy.c. But
                  the conditions under which this can happen are *very*
                  narrow. I am not inclined to do anything about it for that
                  reason.

                  --
                  Peter Geoghegan